Implementing Spinlock for RISC-V OS in Rust
A spinlock is one of the most basic synchronization primitives and one of the first components to consider when implementing an OS.
This article briefly reviews the basics of spinlocks, shows how to implement one for a self-made OS in Rust, and discusses the advantages over doing the same in C.
All code examples are written in Rust unless otherwise noted.
The need for synchronization
When multiple processes run in parallel and touch shared data, controlling their mutual exclusion becomes a problem. For example, suppose each process executes code that modifies the shared data. If those code blocks can run independently and without restriction, the resulting data depends unpredictably on the timing of the reads and writes. This is called a race condition.
Code that can cause a race condition is called a critical section. Synchronization primitives such as the spinlock exist to protect critical sections.
A spinlock uses a loop and atomic CPU instructions to limit the number of processes that can execute the critical section to at most one. The name "spin" comes from the looping.
In Rust, a spinlock is used in the following way. Here, it is exposed as a data structure called Mutex (MUTual EXclusion).
let m: Mutex<usize> = Mutex::new(5);
{
    // critical section
    let mut val = m.lock(); // acquire lock
    *val = 6;
} // release lock
The Mutex wraps and protects data of type usize. The protected data cannot be accessed unless the lock is acquired with lock(); if another process holds the lock, lock() waits in a loop. The lock is then released by drop when the guard goes out of scope.
Let's also look at an example of a spinlock in C for comparison. In general, C manages the state of the lock with a lock-management structure or shared variable, and functions such as acquire() and release() acquire and release the lock.
struct lock vallock;
int m = 5;
acquire(&vallock);
// critical section
m = 6;
release(&vallock);
In C, there is always the risk of forgetting to release a lock, because the code to acquire and release it must be inserted by hand. It is also more dangerous than Rust because the shared data can be accessed without acquiring the lock.
Now that we have an overview of spinlocks, let's look at them from the perspective of atomic operations.
Atomic Operation
To put atomic operations in simple terms: an operation that requires multiple memory accesses is combined into a single step whose intermediate state cannot be observed by the rest of the system. There are only two possible outcomes: either the whole sequence of changes succeeds, or it fails and nothing changes.
The CPU provides instructions for atomic operations, and these are what spinlocks are built on.
In the case of RISC-V, there are the AMO (Atomic Memory Operation) instructions and the Load-Reserved/Store-Conditional (lr/sc) instructions, both of which can perform or implement operations with the following property: no interruption is possible between the memory read and the write, and other processors cannot modify the memory value while the operation is in progress.
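In Rust these CPU primitives surface as methods on the atomic types. For example, AtomicUsize::fetch_add is one indivisible read-modify-write (on RISC-V it can compile down to an AMO instruction such as amoadd), so concurrent increments are never lost. A small hosted demonstration (the `count` helper is my own illustration):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

// Increment a shared counter from `threads` threads, `per` times each.
fn count(threads: usize, per: usize) -> usize {
    let c = AtomicUsize::new(0);
    thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| {
                for _ in 0..per {
                    // One atomic read-modify-write; no other core can
                    // interleave between the read and the write.
                    c.fetch_add(1, Ordering::Relaxed);
                }
            });
        }
    });
    c.load(Ordering::Relaxed)
}

fn main() {
    println!("{}", count(4, 100_000)); // prints 400000
}
```

Had the body been the non-atomic `c = c + 1`, some increments would be overwritten by racing threads.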
One atomic operation commonly used in spinlocks is TAS (Test And Set). Let's see what types and functions Rust provides for this operation, compared with C.
Test and Set (TAS)
TAS compares and assigns a value atomically. The value can then serve as a flag, allowing conditional branching to be implemented atomically. The following pseudo code, written in Rust, shows the semantics of TAS (note that this function as written is not actually atomic):
fn test_and_set(v: &mut bool) -> bool {
    if *v {
        true
    } else {
        *v = true;
        false
    }
}
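Executed as ordinary code, another thread could interleave between the test and the set. A host-runnable atomic version can be built on AtomicBool::swap, which stores true and returns the previous value in one indivisible step:

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Atomic test-and-set: store `true` and return the old value,
// all in a single atomic step (AtomicBool::swap).
fn test_and_set(v: &AtomicBool) -> bool {
    v.swap(true, Ordering::Acquire)
}

fn main() {
    let flag = AtomicBool::new(false);
    assert!(!test_and_set(&flag)); // old value was false: flag acquired
    assert!(test_and_set(&flag));  // old value was true: someone holds it
    flag.store(false, Ordering::Release); // clear the flag again
    assert!(!test_and_set(&flag)); // acquirable once more
    println!("ok");
}
```

Exactly one caller observes false per release, no matter how many threads race on the flag.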
C compilers usually provide TAS as the builtin __sync_lock_test_and_set(type *ptr, type value). The flag set by this function can be cleared with __sync_lock_release(type *ptr), which simply writes false to *ptr. As you can see, C gives you no dedicated types, just compiler builtins.
Rust, on the other hand, has dedicated types: AtomicUsize for usize, AtomicBool for bool, and so on. Moreover, these atomic types provide the atomic operations as methods.
Of course, each Atomic type provides compare_exchange(), which covers the role of the C builtin __sync_lock_test_and_set() (strictly speaking, swap() is the direct equivalent, while compare_exchange() is a compare-and-swap), and store(), which is equivalent to __sync_lock_release(). See the AtomicBool documentation.
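A quick illustration of compare_exchange's semantics: it writes the new value only when the current value matches the expected one, and the Result tells you which case occurred (Ok carries the previous value on success, Err the actual value on failure):

```rust
use std::sync::atomic::{AtomicBool, Ordering};

fn main() {
    let locked = AtomicBool::new(false);

    // Succeeds: the value was `false`, so `true` is written.
    // Ok carries the previous value.
    let r = locked.compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed);
    assert_eq!(r, Ok(false));

    // Fails: the value is already `true`; nothing is written.
    // Err carries the actual current value.
    let r = locked.compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed);
    assert_eq!(r, Err(true));

    // store() clears the flag, playing the role of __sync_lock_release in C.
    locked.store(false, Ordering::Release);
    assert!(!locked.load(Ordering::Relaxed));
    println!("ok");
}
```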
Now, let's implement a simple spinlock using AtomicBool.
Simple Spinlock
Let's start with a bad example. Consider the following Mutex structure representing a spinlock:
pub struct Mutex<T> {
    locked: UnsafeCell<bool>, // Is the lock held?
    data: UnsafeCell<T>,      // actual data
}

pub struct MutexGuard<'a, T: 'a> {
    mutex: &'a Mutex<T>,
}
The important field in this structure is locked, which is false if the lock is available and true if it is held.
Note that UnsafeCell, which wraps locked, provides unrestricted interior mutability; we use it here only because it is otherwise difficult to write this bad example in Rust. See the documentation if you are interested.
UnsafeCell also wraps data, but that use can be made safe by locked under a correct implementation.
The following method can be used to acquire a lock:
impl<T> Mutex<T> {
    pub fn lock(&self) -> MutexGuard<'_, T> {
        let lk = unsafe { &mut *self.locked.get() };
        loop {
            if !*lk {
                *lk = true;
                break MutexGuard { mutex: self };
            }
        }
    }
}
It waits in a loop until it acquires the lock. When the lock is acquired, it returns a MutexGuard structure, which allows access to the internal data through deref() and deref_mut(). As long as the MutexGuard is alive, the lock is held.
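The deref()/deref_mut() implementations are not shown in the listings here; a minimal sketch of what they might look like (hypothetical, but matching the structures above) follows. Holding a MutexGuard is what makes the raw pointer dereference sound:

```rust
use std::cell::UnsafeCell;
use std::ops::{Deref, DerefMut};

pub struct Mutex<T> {
    locked: UnsafeCell<bool>, // Is the lock held?
    data: UnsafeCell<T>,      // actual data
}

pub struct MutexGuard<'a, T> {
    mutex: &'a Mutex<T>,
}

impl<'a, T> Deref for MutexGuard<'a, T> {
    type Target = T;
    // Sound only because a guard exists solely while the lock is held.
    fn deref(&self) -> &T {
        unsafe { &*self.mutex.data.get() }
    }
}

impl<'a, T> DerefMut for MutexGuard<'a, T> {
    fn deref_mut(&mut self) -> &mut T {
        unsafe { &mut *self.mutex.data.get() }
    }
}

fn main() {
    let m = Mutex { locked: UnsafeCell::new(true), data: UnsafeCell::new(5usize) };
    // Constructed directly here only to demonstrate Deref;
    // normally a guard is obtained via lock().
    let mut g = MutexGuard { mutex: &m };
    *g = 6; // DerefMut gives `&mut T`
    println!("{}", *g); // prints 6
}
```

On a correct lock the guard can only be obtained from lock(), so these unsafe dereferences never alias a second mutable borrow.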
This implementation seems to work, but unfortunately it cannot guarantee mutual exclusion on a multiprocessor.
Two CPUs may reach the if condition simultaneously, both find that lk is false, both assign true to lk on the following line, and both acquire the lock. At that point mutual exclusion is broken.
To fix this, the if check and the assignment of true must execute as one atomic (i.e., indivisible) step. This is where the AtomicBool type mentioned earlier and its compare_exchange() method come in:
#[repr(C, align(1))]
pub struct AtomicBool { /* fields omitted */ }

pub fn compare_exchange(
    &self,
    current: bool,
    new: bool,
    success: Ordering,
    failure: Ordering
) -> Result<bool, bool>
This function is implemented with RISC-V atomic instructions (a load-reserved/store-conditional sequence). It writes new if the current value equals current, and the return value indicates whether new was written.
A correct Mutex implementation looks like this:
pub struct Mutex<T> {
    locked: AtomicBool,  // Is the lock held?
    data: UnsafeCell<T>, // actual data
}

impl<T> Mutex<T> {
    pub fn lock(&self) -> MutexGuard<'_, T> {
        loop {
            if self
                .locked
                .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
                .is_ok()
            {
                break MutexGuard { mutex: self };
            }
        }
    }
}
This implementation checks and sets the value atomically, overcoming the bad example, in which the check and the set were separate operations and so could not be performed under mutual exclusion.
For unlocking, the Drop trait can be implemented for MutexGuard as follows, so the lock is released automatically when the MutexGuard goes out of scope:
impl<'a, T: 'a> Drop for MutexGuard<'a, T> {
    fn drop(&mut self) {
        self.mutex.locked.store(false, Ordering::Release);
    }
}
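Putting the pieces together, here is a host-runnable sketch of this Mutex using std instead of a kernel crate; the Sync marker and the Deref impls are additions not shown in the listings above, and the whole thing is an illustration rather than the OS code itself:

```rust
use std::cell::UnsafeCell;
use std::ops::{Deref, DerefMut};
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;

pub struct Mutex<T> {
    locked: AtomicBool,  // Is the lock held?
    data: UnsafeCell<T>, // actual data
}

// Safety: access to `data` is serialized by `locked`.
unsafe impl<T: Send> Sync for Mutex<T> {}

pub struct MutexGuard<'a, T> {
    mutex: &'a Mutex<T>,
}

impl<T> Mutex<T> {
    pub const fn new(data: T) -> Self {
        Self { locked: AtomicBool::new(false), data: UnsafeCell::new(data) }
    }
    pub fn lock(&self) -> MutexGuard<'_, T> {
        // Spin until we flip `locked` from false to true atomically.
        while self
            .locked
            .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {}
        MutexGuard { mutex: self }
    }
}

impl<'a, T> Deref for MutexGuard<'a, T> {
    type Target = T;
    fn deref(&self) -> &T { unsafe { &*self.mutex.data.get() } }
}
impl<'a, T> DerefMut for MutexGuard<'a, T> {
    fn deref_mut(&mut self) -> &mut T { unsafe { &mut *self.mutex.data.get() } }
}
impl<'a, T> Drop for MutexGuard<'a, T> {
    fn drop(&mut self) {
        // Release the lock when the guard goes out of scope.
        self.mutex.locked.store(false, Ordering::Release);
    }
}

fn main() {
    static M: Mutex<u64> = Mutex::new(0);
    thread::scope(|s| {
        for _ in 0..4 {
            s.spawn(|| {
                for _ in 0..10_000 {
                    *M.lock() += 1; // lock, modify, unlock on drop
                }
            });
        }
    });
    println!("{}", *M.lock()); // prints 40000
}
```

Note how the critical section is a single expression: the guard is a temporary, so the lock is released at the end of the statement.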
A conservative spinlock
There is a variant of the spinlock that disables interrupts when acquiring a lock and re-enables them only when all locks have been released, to prevent deadlocks caused by interrupts. We will call this a conservative spinlock; let's see how to implement it.
First, if we think of disabling and enabling interrupts as locking the CPU's interrupt capability, we can model the mechanism as an "interrupt lock" on the CPU. Since interrupts must be re-enabled only once all "interrupt locks" are released, we need to remember the number of outstanding locks to handle nested critical sections.
So here is a Rust-like interface for this "interrupt lock": it disables interrupts in lock(), counts how many times they have been disabled, and re-enables interrupts when that count returns to zero in drop().
Since disabling interrupts is per-CPU state, we manage it with a Cpu structure:
pub struct Cpus([UnsafeCell<Cpu>; NCPU]);

// Per-CPU state
pub struct Cpu {
    pub noff: UnsafeCell<isize>, // Depth of interrupts off
    pub intena: bool,            // Were interrupts enabled before interrupts off?
}
Since the target is a multiprocessor, there are multiple CPU cores: the Cpu structure manages one core, and the Cpus structure manages them all. noff holds the number of times interrupts have been turned off, and intena records whether interrupts were enabled before they were first turned off.
Think of the following IntrLock as the MutexGuard of this scheme: acquiring an "interrupt lock" on the CPU yields this structure, and as long as any such structure is alive, interrupts stay disabled. Conversely, once none of them exist, interrupts are re-enabled.
pub struct IntrLock<'a> {
    cpu: &'a Cpu,
}
noff counts the number of live IntrLocks, and lock() and drop() increment and decrement it.
The implementation of the "interrupt lock" looks like the following. Calling intr_lock() on Cpus fetches the Cpu structure of the core the process is currently running on and disables interrupts. Calling lock() on Cpu then updates the disable count and returns an IntrLock. When an IntrLock is dropped, it calls Cpu.unlock(), which decrements the count and re-enables interrupts once the count reaches zero. In other words, as long as even one IntrLock exists, interrupts remain disabled.
impl Cpus {
    // Return a mutable reference to this core's Cpu struct.
    // Interrupts must be disabled.
    pub unsafe fn my_cpu(&self) -> &mut Cpu {
        let id = Self::cpu_id();
        &mut *self.0[id].get()
    }

    // intr_lock() disables interrupts on my_cpu().
    pub fn intr_lock(&self) -> IntrLock {
        let old = intr_get(); // get status of interrupts
        intr_off(); // disable interrupts
        unsafe { self.my_cpu().lock(old) }
    }
}

impl Cpu {
    // Interrupts must be disabled.
    unsafe fn lock(&mut self, old: bool) -> IntrLock {
        if *self.noff.get() == 0 {
            self.intena = old;
        }
        *self.noff.get() += 1;
        IntrLock { cpu: self }
    }

    // Interrupts must be disabled.
    unsafe fn unlock(&self) {
        assert!(!intr_get(), "unlock - interruptible");
        let noff = self.noff.get();
        assert!(*noff >= 1, "unlock");
        *noff -= 1;
        if *noff == 0 && self.intena {
            intr_on()
        }
    }
}

impl<'a> Drop for IntrLock<'a> {
    fn drop(&mut self) {
        unsafe {
            self.cpu.unlock();
        }
    }
}
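Here intr_get(), intr_off(), and intr_on() are assumed to be RISC-V CSR accessors (reading and toggling the interrupt-enable bit of sstatus), so that code cannot run on a host. The nesting bookkeeping itself, however, can be sketched and exercised on a host by simulating the interrupt flag with an AtomicBool; push_off/pop_off below are stand-in names for lock()/drop(), and the whole block is an illustration, not the kernel code:

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Simulated interrupt-enable flag, standing in for the SIE bit of sstatus.
static INTR_ENABLED: AtomicBool = AtomicBool::new(true);

fn intr_get() -> bool {
    INTR_ENABLED.load(Ordering::Relaxed)
}
fn intr_off() {
    INTR_ENABLED.store(false, Ordering::Relaxed);
}
fn intr_on() {
    INTR_ENABLED.store(true, Ordering::Relaxed);
}

// Per-CPU bookkeeping; a single core is enough for the sketch.
struct Cpu {
    noff: isize,  // depth of nested "interrupt locks"
    intena: bool, // were interrupts enabled before the first lock?
}

impl Cpu {
    fn push_off(&mut self) {
        let old = intr_get();
        intr_off();
        if self.noff == 0 {
            self.intena = old; // remember the pre-lock state only once
        }
        self.noff += 1;
    }

    fn pop_off(&mut self) {
        assert!(self.noff >= 1, "pop_off without matching push_off");
        self.noff -= 1;
        // Re-enable only when the outermost "interrupt lock" is released.
        if self.noff == 0 && self.intena {
            intr_on();
        }
    }
}

fn main() {
    let mut cpu = Cpu { noff: 0, intena: false };
    cpu.push_off(); // outer critical section: interrupts go off
    cpu.push_off(); // nested critical section
    cpu.pop_off();
    assert!(!intr_get()); // one lock still held: interrupts stay off
    cpu.pop_off();
    assert!(intr_get()); // last lock released: interrupts back on
    println!("ok");
}
```

The key property exercised here is that the inner pop_off does not re-enable interrupts; only the outermost release does, and only if they were enabled to begin with.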
With the code above, interrupts are disabled by calling intr_lock() on Cpus, and re-enabling is driven by dropping the returned IntrLock, so you cannot forget to re-enable interrupts after disabling them.
With this "interrupt lock" in place, implementing a conservative spinlock is straightforward.
Acquire an IntrLock while acquiring the Mutex lock, and store it in the MutexGuard. That way, interrupts are disabled while any spinlock is held and re-enabled when the last spinlock is released by drop().
pub struct Mutex<T> {
    locked: AtomicBool,  // Is the lock held?
    data: UnsafeCell<T>, // actual data
}

pub struct MutexGuard<'a, T: 'a> {
    mutex: &'a Mutex<T>,
    _intr_lock: IntrLock<'a>,
}

impl<T> Mutex<T> {
    pub fn lock(&self) -> MutexGuard<'_, T> {
        let _intr_lock = CPUS.intr_lock(); // disable interrupts to avoid deadlock
        loop {
            if self
                .locked
                .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
                .is_ok()
            {
                break MutexGuard {
                    mutex: self,
                    _intr_lock,
                };
            }
        }
    }
}
It is important to call intr_lock() before compare_exchange(). In the opposite order there would be a brief window in which the lock is held with interrupts enabled, and an unfortunately timed interrupt could deadlock the system. Likewise, interrupts must be re-enabled only after store(); thanks to Rust's drop order (Drop::drop runs before the guard's fields, including _intr_lock, are dropped), this always happens at the right time.
OS Development with Rust
If you implement a spinlock in C that also disables interrupts, there is always the danger of forgetting to release the lock or to re-enable interrupts. For example, when the C counterpart of intr_lock() is called outside the spinlock, nothing does the post-processing for you, so a matching intr_unlock() must be called by hand every time. Developing an OS in Rust avoids this class of problem. For this and its various other advantages, I think Rust is well suited to OS development.
You can find the OS code I am implementing, including the complete implementation of the spinlock, in this repo.